Large-Scale and Site-Specific Mapping of the Murine Brain O-Glycoproteome with IMPa

Altered protein glycosylation is typically associated with cognitive defects and other phenotypes, but there is a lack of knowledge about the brain glycoproteome. Here, we used the newly available O-glycoprotease IMPa from Pseudomonas aeruginosa for comprehensive O-glycoproteomic analyses of the mouse brain. In this approach, total tryptic glycopeptides were prepared, extracted, purified, and conjugated to a solid support before an enzymatic cleavage by IMPa. O-glycopeptides were analyzed by electron-transfer/higher-energy collision dissociation (EThcD), which permits site-specific and global analysis of all types of O-glycans. We developed two complementary approaches for the analysis of the total O-glycoproteome using HEK293 cells and derivatives. The results demonstrated that IMPa and EThcD facilitate the confident localization of O-glycans on glycopeptides. We then applied these approaches to characterize the O-glycoproteome of the mouse brain, which revealed the high frequency of various sialylated O-glycans along with the unusual presence of the Tn antigen. Unexpectedly, the results demonstrated that glycoproteins in the brain O-glycoproteome only partly overlap with those reported for the brain N-glycoproteome. These approaches will aid in identifying the novel O-glycoproteomes of different cells and tissues and foster clinical and translational insights into the functions of protein O-glycosylation in the brain and other organs.


Cell lysis, protein extraction, and peptide purification
HEK293 cells were lysed in an ice-cold lysis buffer containing 50 mM HEPES pH=8.6, 0.5% sodium deoxycholate, 150 mM sodium chloride, and cOmplete protease inhibitor cocktail EDTA-free (Roche). The cells were sonicated three times with a Qsonica sonicator for 10 s each at 20 A with a 30 s break between rounds, followed by incubation at 4°C with end-over-end rotation for 45 min. Brain tissues were homogenized with a Dounce homogenizer on ice before the sonication. The lysates from both HEK cells and mouse brain tissues were centrifuged at 3220 × g for 10 min and the debris was discarded. The protein concentration was measured with a BCA assay. The proteins were reduced with 5 mM dithiothreitol at 56°C for 30 min and subsequently alkylated with 14 mM iodoacetamide at room temperature in the dark for 30 min. 2 The proteins were purified with the methanol/chloroform precipitation method. The pellets were dried and resuspended in a digestion buffer containing 50 mM HEPES pH=8.6, 1.6 M urea, and 5% acetonitrile (ACN). The proteins were digested with sequencing grade modified trypsin (Promega) at a protein:enzyme ratio of 100:1 at 37°C with shaking overnight.

Enzymatic release of O-glycopeptides with IMPa
Methods for glycopeptide extraction and immobilization were adapted and modified from Yang et al. [3][4] After protein digestion, the mixture was acidified with trifluoroacetic acid (TFA) to pH<2, clarified by centrifugation, and desalted with C18 Sep-Pak Vac cartridges (Waters). Before the elution step, the columns were also loaded with a solution containing 0.95 M ammonium hydroxide, 0.2 M O-methylisourea hemisulfate salt, and 0.03% TFA, and incubated at 65°C for 20 min to convert lysine into homoarginine. This prevents lysine from being conjugated to AminoLink beads during the reductive amination reaction, so that the glycopeptides can be released after IMPa cleavage. The eluted peptides were dried by lyophilization. For total O-glycoprotein experiments, the peptides were resuspended in 95% ACN with 1% TFA. The glycopeptides were enriched with HyperSep™ Retain AX Cartridges (RAX, Thermo). 6][7] However, the cartridge consists of a polystyrene-divinylbenzene polymer functionalized with a quaternary amine group. Similar to previous studies [3][4][8], RAX cartridges were conditioned with 1) ACN, 2) 100 mM pH=7 triethylammonium acetate, 3) deionized water, and 4) 95% ACN/1% TFA. The peptides were then loaded and washed with 95% ACN/1% TFA. Finally, enriched glycopeptides were eluted with 50% ACN/0.1% TFA. For Tn-focused experiments, the peptides were resuspended in PBS and enriched with agarose-bound VVA (Vector) overnight at room temperature according to the manufacturer's protocol. The glycopeptides were eluted by incubating with a solution containing 4 M urea, 100 mM Tris-HCl pH=7.4, and 400 mM N-acetyl-D-galactosamine (Carbosynth) for 30 min at room temperature with shaking. The eluted glycopeptides were desalted with C18 Sep-Pak Vac cartridges. Glycopeptides from both approaches were dried via lyophilization.
The resulting glycopeptides were resuspended in PBS and the pH was adjusted to ~7. The peptide amount was determined with a NanoDrop spectrophotometer. The peptides were conjugated to AminoLink™ Plus Coupling Resin (Thermo) at a peptide:bead ratio of 1 µg:1 µL in the presence of 50 mM sodium cyanoborohydride with end-over-end rotation at room temperature overnight. The beads were subsequently blocked by a reaction with 1 M Tris-HCl pH=7.5 with 50 mM sodium cyanoborohydride for 30 min at room temperature with end-over-end rotation. The beads were washed four times with 1) 60% ACN and 0.1% TFA, 2) 1.5 M sodium chloride, and 3) 20 mM Tris-HCl pH=7.5. The beads were resuspended in 20 mM Tris-HCl pH=7.5. O-glycopeptides were released by IMPa O-glycoprotease (New England BioLabs) at a peptide:enzyme ratio of 10 µg:1 µL. The reaction took place at 37°C with end-over-end rotation overnight. The next day, the supernatant was collected. The beads were washed twice with 0.1% TFA and the washes were pooled with the supernatant. The pooled eluate was desalted with a C18 Sep-Pak cartridge and dried by lyophilization.
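The two ratios above (1 µg peptide per 1 µL resin, 10 µg peptide per 1 µL IMPa) fix the reagent volumes once the peptide amount is known. The following is a minimal sketch of that arithmetic only; the function name and the returned keys are hypothetical and are not part of the published protocol.

```python
def coupling_volumes(peptide_ug: float) -> dict:
    """Return resin and IMPa volumes (in µL) for a given peptide amount (in µg).

    Ratios are taken from the text: 1 µg peptide : 1 µL AminoLink resin,
    and 10 µg peptide : 1 µL IMPa. Names here are illustrative only.
    """
    if peptide_ug <= 0:
        raise ValueError("peptide amount must be positive")
    resin_ul = peptide_ug * 1.0   # 1 µg : 1 µL
    impa_ul = peptide_ug / 10.0   # 10 µg : 1 µL
    return {"resin_ul": resin_ul, "impa_ul": impa_ul}


if __name__ == "__main__":
    # Example: 200 µg of enriched glycopeptides
    print(coupling_volumes(200.0))
```

For 200 µg of peptide, this gives 200 µL of resin and 20 µL of enzyme.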

High-pH peptide fractionation
For the large-scale mouse brain experiment, the dried glycopeptides were fractionated by high-pH reversed-phase chromatography using a Shimadzu 20A HPLC system. Peptides were fractionated using an XBridge C18 3.5 µm, 4.6 × 250 mm column (Waters). Buffer A was 10 mM ammonium formate pH=10 in water and buffer B was 10 mM ammonium formate pH=10 in 90% ACN. The peptides were fractionated and collected during a 50-minute gradient of 5% to 35% buffer B at a flow rate of 0.7 mL/minute. The peptides were collected every 2 minutes, consolidated into 12 fractions, and dried by lyophilization. The peptides were desalted with a C18 Sep-Pak cartridge and dried by lyophilization.
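As an illustration of the fractionation arithmetic, the sketch below computes the buffer B percentage along the linear 5% to 35% gradient and shows one way 2-minute collections could be consolidated into 12 fractions. The text does not state how the collections were combined; the interleaved (concatenation) scheme used here, and the assumption that collection spans exactly the 50-minute gradient (25 collections), are illustrative assumptions only.

```python
def percent_b(t_min: float, start: float = 5.0, end: float = 35.0,
              length_min: float = 50.0) -> float:
    """Buffer B percentage at time t_min on a linear gradient."""
    if not 0.0 <= t_min <= length_min:
        raise ValueError("time outside the gradient")
    return start + (end - start) * t_min / length_min


def pool_fractions(n_collected: int = 25, n_pools: int = 12) -> list:
    """Assign collections 1..n_collected to n_pools by interleaving.

    Assumed scheme (not stated in the text): collection i goes to
    pool (i - 1) mod n_pools, so early and late eluting peptides are mixed.
    """
    pools = [[] for _ in range(n_pools)]
    for i in range(n_collected):
        pools[i % n_pools].append(i + 1)
    return pools


if __name__ == "__main__":
    print(percent_b(25.0))          # midpoint of the gradient
    for k, members in enumerate(pool_fractions(), start=1):
        print(f"fraction {k}: collections {members}")
```

Under these assumptions the first pooled fraction contains collections 1, 13, and 25, and the remaining eleven contain two collections each.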

LC-MS/MS analysis
Peptide sequencing was performed on a Dionex UltiMate 3000 UHPLC system coupled to an Orbitrap Fusion Lumos Tribrid mass spectrometer (Thermo). The glycopeptides were dissolved in a solution containing 0.1% formic acid (FA) in water and loaded onto a C18 precolumn (C18 PepMap 100, 300 µm × 5 mm, 5 µm, 100 Å, Thermo) with 15 µL/min solvent A (0.1% FA in water) for 3 min, and separated on a C18 analytical column (PicoFrit 75 µm ID × 150 mm, 3 µm, New Objective) using a gradient of 2-40% solvent B (80% ACN and 0.1% FA in water) over 95 min, followed by a gradient of 40-90% over 3 min. The ion source voltage was 2100 V. The ion transfer tube temperature was 275°C. Higher-energy collision dissociation product ion-triggered electron-transfer/higher-energy collision dissociation (HCD-pd-EThcD MS2) or HCD MS2 alone was used for glycopeptide identification. With the HCD-pd-EThcD approach, the mass spectrometer was operated in a data-dependent acquisition mode where the 15 most intense ions from MS1 were sequenced in MS2 with HCD, and those with specified oxonium ions in MS2 were also sequenced with EThcD at the MS2 level. MS1 was performed in the Orbitrap at a resolution of 120,000. Depending on the experiment, the MS1 scan range was set to m/z 350-2000 or m/z 800-2000. The normalized AGC target was set to 100% with a maximum ion injection time of 50 ms. The quadrupole was used to isolate precursor ions with an isolation window of 1.6 m/z. Dynamic exclusion was employed for 10 s with a mass tolerance of 10 ppm after 1 time. Stepped HCD with collision energies of 20%, 30%, and 40% was used to fragment the peptides in HCD MS2. MS2 detection was performed in the Orbitrap with a resolution of 30,000 and a first m/z of 100. The normalized AGC target was set to 100% with a maximum injection time of 250 ms. A targeted mass trigger was used after HCD MS2: when at least two ions with m/z=204.0865, 168.0654, 186.0760, 274.0921, 292.1027, 366.1395, 126.0550, 138.0549, or 144.0655, matched with a mass tolerance of 25 ppm, were among the 20 most intense ions, another MS2 scan with EThcD was triggered. 9 Calibrated charge-dependent ETD was enabled with an ETD supplemental energy of 25%. Ions were detected in the Orbitrap with a resolution of 30,000 in the high mass range mode with a scan range of m/z=120-4000. The normalized AGC target was 200% and the maximum ion injection time was 200 ms. For test run experiments with HCD MS2, the mass spectrometer was operated in a data-dependent acquisition mode where the 15 most intense ions from MS1 were sequenced in MS2 with HCD. MS1 was performed in the Orbitrap at a resolution of 60,000. The MS1 scan range was set to m/z 400-1600. The normalized AGC target was set to 25% with a maximum ion injection time of 50 ms. The quadrupole was used to isolate precursor ions with an isolation window of 1.5 m/z. Dynamic exclusion was employed for 10 s with a mass tolerance of 20 ppm after 1 time. Stepped HCD with collision energies of 25%, 30%, and 35% was used to fragment the peptides in HCD MS2. MS2 detection was performed in the Orbitrap with a resolution of 30,000 and a first m/z of 100. The normalized AGC target was set to 400% with a maximum injection time of 250 ms.
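The product-ion trigger described above can be expressed as a simple rule on an HCD MS2 peak list. The sketch below is an offline illustration of that rule, not the instrument's firmware logic; the function names and the peak-list representation (a list of (m/z, intensity) tuples) are assumptions made for the example.

```python
# Oxonium ion m/z values listed in the text
OXONIUM_MZ = (204.0865, 168.0654, 186.0760, 274.0921, 292.1027,
              366.1395, 126.0550, 138.0549, 144.0655)


def within_ppm(observed: float, theoretical: float, tol_ppm: float) -> bool:
    """True if the observed m/z lies within tol_ppm of the theoretical m/z."""
    return abs(observed - theoretical) / theoretical * 1e6 <= tol_ppm


def triggers_ethcd(peaks, tol_ppm: float = 25.0, top_n: int = 20,
                   min_hits: int = 2) -> bool:
    """Decide whether an HCD MS2 spectrum would trigger an EThcD scan.

    peaks: iterable of (mz, intensity) tuples (hypothetical representation).
    Rule from the text: at least `min_hits` distinct oxonium ions, matched
    within `tol_ppm`, must be among the `top_n` most intense peaks.
    """
    top = sorted(peaks, key=lambda p: p[1], reverse=True)[:top_n]
    hits = {ref for ref in OXONIUM_MZ
            if any(within_ppm(mz, ref, tol_ppm) for mz, _ in top)}
    return len(hits) >= min_hits


if __name__ == "__main__":
    spectrum = [(204.0867, 5.0e5), (138.0550, 3.0e5), (512.2710, 1.0e5)]
    print(triggers_ethcd(spectrum))
```

Counting distinct reference ions, rather than matching peaks, avoids triggering on two nearby peaks that both fall within tolerance of the same oxonium ion.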

Database searching and bioinformatics analysis
Raw files were searched using pGlyco3 10 against the human proteome database (downloaded on 10/05/2022, reviewed, containing both canonical and non-canonical sequences, 42,397 entries) or the mouse (Mus musculus) proteome database (downloaded on 06/09/2021, reviewed, containing both canonical and non-canonical sequences, 25,368 entries) from UniProt. Trypsin was set as the enzyme with the specificity set to semi. The maximum number of missed cleavages was 2. Carbamidomethylation of cysteine (+57.0215 Da) and guanidination of lysine (+42.0218 Da) were set as static protein modifications. Oxidation of methionine (+15.9949 Da) was set as a variable protein modification. The maximum number of variable modifications was 2. Peptide length was from 6 to 40. The glycan database used was the default "Multi-Site-O-Glycan". Precursor ion mass tolerance was 10 ppm and fragment ion mass tolerance was 20 ppm. The glycopeptide false discovery rate (FDR) was 1%. Identified glycopeptides were inspected and filtered manually. In general, we checked for the presence of c- and z-ions and confirmed the localization of the O-glycan. Glycan composition was also validated by determining the glycan losses from the precursor ions. Other spectra were removed from the final results. Additionally, we confirmed the presence of oxonium ions at m/z=204 and, in some cases, m/z=292 and 274 for sialic acid. Only those without the N!P[S/T/C] sequon for N-linked glycosylation were retained. The 138/144 ratios were extracted with pGlycoQuant. 11 The majority of the identified and filtered GSMs have a ratio <3, indicating a high number of O-glycopeptides in the datasets. For total O-glycoprotein experiments, identifications were classified as type 1 if they had a localized site with supporting fragment ions or a non-zero localization score. The peptides must also have a terminal serine or threonine when IMPa was used. The rest were classified as type 2.
For Tn-focused experiments, type 1 were those with HexNAc(1) as the modified glycan with supporting fragment ions and a non-zero localization score, type 2 were those identified by HCD fragmentation only but modified with HexNAc(1) with a terminal serine or threonine, and type 3 were the rest. Glycosylation types can be found in the supplemental tables. Tissue enrichment analysis was performed with FUMA. 12 N-glycosylation prediction was analyzed with NetNGlyc. 135] Sequence logos were generated with pLogo. 16 Figures were created with BioRender.com. OriginLab was used to create Fig. 2. We define the glycospecies as follows. Glycoprotein groups, analogous to protein groups 17, refers to glycoproteins that were inferred from the glycopeptide. This could be the canonical form if the protein does not have any isoforms, multiple isoforms if the identified glycopeptide cannot be used to distinguish between them, or a specific isoform if the identified glycopeptide can be used to distinguish between the isoforms. Each case was counted as one glycoprotein group. Glycoproteins refers to the individual glycoproteins from the glycoprotein groups. For example, if one glycoprotein group contains three possible isoforms that cannot be distinguished, the glycoprotein count is three. Note that it is likely that only one isoform was detected but could not be distinguished from the others, which might lead to an inflated number of glycoproteins reported. All glycopeptides (in the Tn glycoproteomics section) refers to all non-redundant glycopeptides with the Tn antigen or with other O-glycans. GSMs refers to glycopeptide-spectrum matches, similar to peptide-spectrum matches for nonglycosylated peptides. Glycoforms refers to the specific O-glycan on a particular glycosylation site. For example, if three O-glycans were detected on one glycosylation site, the number of glycoforms is counted as three.
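Two of the filters described above lend themselves to a short illustration: the exclusion of peptides carrying the N!P[S/T/C] sequon, and the 138/144 oxonium ion intensity ratio. The sketch below is a minimal stand-in written for this text; it is not the pGlyco3 or pGlycoQuant implementation, the function names are hypothetical, and a peptide-level check like this cannot see a sequon that extends beyond the peptide's C-terminus.

```python
import re

# N, then any residue except P, then S, T, or C (the N!P[S/T/C] sequon)
N_SEQUON = re.compile(r"N[^P][STC]")


def has_n_sequon(peptide: str) -> bool:
    """True if the peptide sequence contains an N-glycosylation sequon."""
    return N_SEQUON.search(peptide.upper()) is not None


def keep_as_o_glycopeptide(peptide: str) -> bool:
    """Retain only peptides without the N-glycosylation sequon."""
    return not has_n_sequon(peptide)


def ratio_138_144(intensity_138: float, intensity_144: float) -> float:
    """Intensity ratio of the m/z 138 and m/z 144 oxonium ions.

    Returns infinity when the m/z 144 ion is absent.
    """
    if intensity_144 <= 0:
        return float("inf")
    return intensity_138 / intensity_144


if __name__ == "__main__":
    for pep in ("AANGTK", "AANPTK", "SPTTPAPK"):
        print(pep, keep_as_o_glycopeptide(pep))
    print(ratio_138_144(10.0, 5.0))
```

A ratio below 3, as reported for most filtered GSMs, is the criterion the text uses as an indicator of O-glycopeptides.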

N-glycoproteomics analysis
To analyze N- and O-glycoproteins from the same tissue, the proteins from mouse brain were extracted and digested similarly. About 5 mg of protein was used for N-glycoproteomics analysis and the rest was used for O-glycoproteomics analysis as described above. A combination of six agarose-bound lectins, including 250 μL Concanavalin A (ConA), 80 μL Ricinus communis agglutinin I (RCA), 80 μL Sambucus nigra agglutinin (SNA), 90 μL Aleuria aurantia lectin, 120 μL wheat germ agglutinin (WGA), and 80 μL Vicia villosa agglutinin (Vector Laboratories), was used for the enrichment. The dried and purified peptides were dissolved in a binding buffer containing 50 mM HEPES, 1 mM MnCl2, 1 mM MgCl2, and 1 mM CaCl2, and incubated with the lectin beads for 1 hour at 37°C with end-over-end rotation. The beads were washed 5 times with the binding buffer. Enriched glycopeptides were eluted by incubation with 1) 200 mM methyl α-D-mannopyranoside, 200 mM methyl α-D-glucopyranoside, 200 mM D-(+)-galactose, 100 mM L-(-)-fucose, 500 mM N-acetyl-D-glucosamine, and 200 mM N-acetyl-D-galactosamine (Carbosynth), 2) 500 mM D-lactose (Fisher), and 3) 500 mM D-lactose containing 200 mM acetic acid (Chem-Impex). Each elution was incubated for 15 min with shaking at room temperature. The eluted glycopeptides were combined, desalted with a C18 Sep-Pak cartridge, and dried by lyophilization. N-glycopeptides were analyzed with LC-MS/MS using a gradient similar to that of the method for O-glycoproteomics analysis. MS1 was performed in the Orbitrap at a resolution of 120,000. The MS1 scan range was set to m/z 800-2000. The normalized AGC target was set to 100% with a maximum ion injection time of 50 ms. The quadrupole was used to isolate precursor ions with an isolation window of 1.5 m/z. Dynamic exclusion was employed for 10 s with a mass tolerance of 20 ppm after 1 time. Stepped HCD with collision energies of 20%, 30%, and 40% was used to fragment the peptides in HCD MS2. MS2 detection was performed in the Orbitrap with a resolution of 30,000 and a first m/z of 100. The normalized AGC target was set to 100% with a maximum injection time of 250 ms. Raw files were searched using pGlyco3. The same mouse FASTA file was used as above. Trypsin was set as the enzyme with the specificity set to specific. The maximum number of missed cleavages was 2. Carbamidomethylation of cysteine (+57.0215 Da) was set as a static protein modification. Oxidation of methionine (+15.9949 Da) was set as a variable protein modification. The maximum number of variable modifications was 2. Peptide length was from 6 to 40. The glycan database used was the default "pGlyco-N-Mouse". Precursor ion mass tolerance was 10 ppm and fragment ion mass tolerance was 20 ppm. The glycopeptide false discovery rate (FDR) was 1%. Identified glycopeptides were inspected and filtered manually. Only those with the N!P[S/T] sequon for N-linked glycosylation were retained.

Fig. S1. Identification of Tn glycoproteins from SimpleCell HEK293 cells. Localization score of Tn glycosylation sites from VVA enrichment and from VVA enrichment combined with the IMPa reaction. Both were from HCD-pd-EThcD experiments.

Fig. S2. Large-scale O-glycoproteomics analysis of mouse brain. (A) Workflow for the identification of O-glycoproteins in mouse brain. Here, mouse brains were processed similarly to those in Fig. 1. However, after O-glycopeptides were cleaved with IMPa, high-pH reversed-phase HPLC was used to fractionate the resulting peptides into 12 fractions. Each fraction was analyzed with LC-MS/MS. (B) Comparison of mouse brain O-glycoproteins from single-shot experiments (from Fig. 3A, both biological duplicates were pooled) and experiments with 2D fractionation from Fig. S1A.

Fig. S4. Clustering analysis of O-glycoproteins from mouse brains. (A) Heatmap showing tissue expression of the identified glycoproteins using FUMA through GTEx v8 with 54 tissue types. The color key shows the average of normalized expression per label (zero mean across samples). (B) Tissue specificity enrichment analyzed using FUMA through GTEx v8 with 54 tissue types. The tissues were ordered by the P-value of up-regulated differentially expressed genes. Those significantly enriched (PBon<0.05) were highlighted in red. (C) DAVID clustering based on cellular compartments, molecular functions, biological processes, and KEGG pathways of the identified glycoproteins.

Fig. S5. Overlap of mouse brain N- and O-glycoproteins. Overlap of mouse brain O-glycoproteins with N-glycoproteins from normal human brains from intact N-glycoproteomics analysis (A), with deglycosylated N-glycoproteins from mouse brains of the same strain (B), or with intact N-glycoproteins from mouse brains of the same strain (C).

Fig. S6. Identification of N- and O-glycoproteins from the same mouse brain. (A) Experimental scheme for simultaneous identification of N- and O-glycoproteins from the same mouse brain. Mouse brain was lysed and digested into peptides. N-glycopeptides were enriched with six lectin-agarose beads while O-glycopeptides were enriched with the method shown in this work. (B) Comparison of N- and O-glycoproteins identified from (A). (C) 138/144 ratio of N-glycopeptides (left) and O-glycopeptides (right).

Fig. S7. Comparison of O-GlcNAcylated and O-GalNAcylated glycosylation sites and glycoproteins. (A) Comparison with O-GlcNAcylated glycosylation sites and glycoproteins from mouse synapse. 18-19 (B) Comparison with O-GlcNAcylated glycosylation sites and glycoproteins from mouse cerebrocortical brain tissue. 20 (C) Comparison with O-GlcNAcylated glycosylation sites and glycoproteins from the O-GlcNAc database. 21 Only data from Mus musculus were extracted.

Fig. S8. O-glycosylation profile of Vcan. An example of site-specific O-glycopeptide identification. O-glycans were identified from versican core protein (Vcan). Some glycans without a predicted structure were omitted from the figure. Predicted N-glycosylation sites from NetNGlyc were shown in red. 13 Predicted O-glycosylation sites by NetOGlyc were shown in italic (all were predicted here). 1 See the full list in the Supplementary Information.

Fig. S9. O-glycosylation profile of Ncan. An example of site-specific O-glycopeptide identification. O-glycans were identified from neurocan core protein (Ncan). Some glycans without a predicted structure were omitted from the figure. Predicted N-glycosylation sites from NetNGlyc were shown in red. 13 Predicted O-glycosylation sites by NetOGlyc were shown in italic (all were predicted except for sites 838 and 850). 1 See the full list in the Supplementary Information.

Fig. S10. RNA expression of glycosyltransferases involved in O-glycosylation. Hierarchical clustering of RNA abundance of glycosyltransferases involved in O-glycosylation from different parts of the mouse brain, extracted from DropViz. 22 Classes and clusters from the single-cell RNA-seq experiment were clustered into each brain region. Overall, the transcriptome is consistent with the types of O-glycans identified here and in our prior studies 23, and suggests that glycan and glycoprotein expression within the brain may vary by region and cell type. While here we examined the whole mouse brain, future studies on region-specific and cell-specific O-glycoproteome and O-glycan expression, and transcriptomics within the murine brain, can be addressed using the broad glycoproteomic information supplied here.

Fig. S11. Sequence logo of O-glycosylation sites. Full logo of identified O-glycosylation sites with the sites centered at the 0 position. This figure shows the overall enrichment. The full enrichment result is shown in Table S8.